The present invention relates to anonymization of information.
These days, there is an increasing social demand for privacy protection. Privacy must be taken into account in the information systems of corporations that handle personal information. Although social convention has not yet established which objects should be protected or how they should be protected, it is essential for corporations (businesses handling personal information) to observe at least the legislation relating to personal information protection (hereinafter referred to as “privacy legislation”) in various countries. Typical privacy legislation requires corporations to carry out measures necessary for personal information management, such as the collection and use of personal information.
Some privacy legislation, such as EU Directive 2002/58/EC and, to some extent, HIPAA itself, requires personal information to be anonymized for service management, except in cases where it is necessary to identify an individual (data subject).
A simple way of anonymizing personal information is to remove, from the personal information of individuals, the information with which the individuals can be identified, or to make that information vague. An example of the former is processing to remove names and addresses. An example of the latter is processing to convert addresses to prefecture units or to convert ages to intervals of 10 years.
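The two kinds of processing described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `Record` type, its field names, and the address format (prefecture written first, separated by spaces) are hypothetical and are not taken from any disclosed system.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record layout, assumed for illustration only.
    name: str
    address: str   # assumed format: "<prefecture> <city> <street>"
    age: int
    diagnosis: str


def anonymize(record: Record) -> dict:
    """Remove the name and coarsen the address and age."""
    parts = record.address.split()
    prefecture = parts[0] if parts else ""   # address -> prefecture unit
    decade = (record.age // 10) * 10         # age -> 10-year interval
    return {
        # The name is simply not copied, and only the prefecture
        # survives from the address.
        "prefecture": prefecture,
        "age_range": f"{decade}-{decade + 9}",
        "diagnosis": record.diagnosis,
    }


rec = Record("Taro Yamada", "Tokyo Minato 1-2-3", 37, "asthma")
result = anonymize(rec)
```

In this sketch the name is removed outright, while the address and the age are kept only in a vaguer form.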
However, even if such processing is performed, it may still be possible to identify specific individuals from the anonymized personal information by collating it with other information that can be acquired concerning the individuals. Therefore, in anonymizing personal information, it is desirable to ensure the security of the personal information in terms of identifiability and the like.
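The collation risk described above can be illustrated with a small sketch. All data, field names, and the `reidentify` function below are hypothetical and invented for illustration. The sketch assumes that an attacker holds a separate list containing names together with the same coarse attributes that remain in the anonymized data.

```python
# Anonymized records: names removed, address and age coarsened (made-up data).
anonymized = [
    {"prefecture": "Tokyo", "age_range": "30-39", "diagnosis": "asthma"},
    {"prefecture": "Tokyo", "age_range": "40-49", "diagnosis": "diabetes"},
    {"prefecture": "Osaka", "age_range": "30-39", "diagnosis": "fracture"},
]

# Other information that can be acquired concerning individuals (made-up data).
public = [
    {"name": "Taro Yamada", "prefecture": "Tokyo", "age": 37},
    {"name": "Hanako Sato", "prefecture": "Osaka", "age": 52},
]


def age_range(age: int) -> str:
    """Map an age to the same 10-year interval used in the anonymized data."""
    decade = (age // 10) * 10
    return f"{decade}-{decade + 9}"


def reidentify(anonymized: list, public: list) -> list:
    """Return (name, record) pairs whose attributes match exactly one record."""
    matches = []
    for person in public:
        key = (person["prefecture"], age_range(person["age"]))
        candidates = [
            r for r in anonymized
            if (r["prefecture"], r["age_range"]) == key
        ]
        if len(candidates) == 1:   # a unique match identifies the individual
            matches.append((person["name"], candidates[0]))
    return matches


linked = reidentify(anonymized, public)
```

Here one individual is linked to a single anonymized record even though no name appears in the anonymized data, because the combination of prefecture and age interval is unique.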
Techniques for protecting personal information in electronic data, specifically in text data, are disclosed in Japanese Patent No. 3578450 (hereinafter referred to as Patent Document 1) and Japanese Patent Laid-open Publication No. 2002-269081 (hereinafter referred to as Patent Document 2).
Patent Document 1 discloses a technique for converting a real name word in an electronic document into an anonym word using a real name word/anonym word dictionary created in advance.
Patent Document 2 discloses a technique for anonymizing real names and surrounding wording highly relevant to the real names using a dictionary and syntactic rules prepared in advance.
In the technique disclosed in Patent Document 1, it is necessary to prepare a dictionary of words to be anonymized. Therefore, for text that can take various forms, for example, text whose form is not specifically decided, it is difficult to store all words to be anonymized in a dictionary.
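The dependence on a dictionary prepared in advance can be illustrated with a minimal sketch of dictionary-based substitution. This is not the implementation of Patent Document 1. The dictionary contents, the function name `replace_names`, and the sample text are assumptions made for illustration.

```python
import re

# Hypothetical real name word / anonym word dictionary, created in advance.
NAME_DICTIONARY = {
    "Yamada": "Person-A",
    "Sato": "Person-B",
}


def replace_names(text: str, dictionary: dict) -> str:
    """Replace every dictionary word found in the text with its anonym word."""
    if not dictionary:
        return text
    # Try longer words first so that a short entry never pre-empts a longer one.
    words = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return pattern.sub(lambda m: dictionary[m.group(0)], text)


text = "Yamada met Sato and Suzuki."
converted = replace_names(text, NAME_DICTIONARY)
```

In this sketch "Suzuki" is left as-is because it is not registered in the dictionary, which is the kind of omission that becomes hard to avoid when the form of the text is not decided in advance.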
In the technique disclosed in Patent Document 2, as in the technique disclosed in Patent Document 1, it is necessary to prepare a dictionary of words to be anonymized. Therefore, for text that can take various forms, for example, text whose form is not specifically decided, it is difficult to hold all words to be anonymized in a dictionary. Further, in the technique disclosed in Patent Document 2, appearance probabilities of respective words and of surrounding wording including the words are calculated. However, when a combination of a word and surrounding wording including the word is rare, the word cannot be anonymized.